Read the Tutorial

These instructions assume you've read the following two sections of the main tutorial:

Selecting MVAPICH2

As with the other MPIs, you will need to use switcher to change to MVAPICH2:

switcher mpi = pgi-mvapich2-1.2p1

If you want, you can use the GCC version of MVAPICH2 instead; the instructions in this tutorial are otherwise identical. Note that switcher handles gcc-mvapich2-1.2p1 correctly, so you do not need the fix mentioned above when using the GCC version of MVAPICH2:

switcher mpi = gcc-mvapich2-1.2p1

Make sure you log out and log back in so that switcher will update your MPI settings.
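
If you want to double-check which MPI implementation switcher will hand you on your next login, switcher can report its current settings. This is only a quick sanity check, and it assumes your installation of switcher supports the --show and --list options:

switcher mpi --show    # report the mpi attribute currently in effect
switcher mpi --list    # list the mpi tags available on this cluster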

Compiling

Now that you've used switcher to select MVAPICH2, you must recompile the hello_parallel program (or whatever program you're running). The commands used to compile code do not vary between the MPI implementations. For hello_parallel:

  • For C++: mpicxx hello_parallel.cc -o hello_parallel
  • For C: mpicc hello_parallel.c -o hello_parallel
  • For Fortran 77: mpif77 hello_parallel.f -o hello_parallel
  • For Fortran 90: mpif90 hello_parallel.f90 -o hello_parallel
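
If you want to confirm that the compiler wrappers now come from MVAPICH2 before recompiling, something along the following lines should work. Treat this as a sketch; which and mpicc -show (the latter prints the underlying compile line and is standard in MPICH-based implementations such as MVAPICH2) are optional sanity checks, not required steps:

which mpicc                               # should point into the MVAPICH2 installation
mpicc -show                               # print the real compiler and flags mpicc will use
mpicc hello_parallel.c -o hello_parallel  # then recompile as usual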

Running

Running a program under MVAPICH2 is different from the other MPI implementations (but much easier than it was in 2008, before the upgrade to OFED 1.4). Your mpirun command is replaced with an mpirun_rsh command, as shown in the qsub script below. The qsub script is the same for Fortran 77, C, C++ and Fortran 90.

#!/bin/bash
: The above line tells Linux to use the shell /bin/bash to execute
: this script.  That must be the first line in the script.

: You must have no lines beginning with # before these
: PBS lines other than the line with /bin/bash
#PBS -N 'hello_parallel'
#PBS -o 'qsub.out'
#PBS -e 'qsub.err'
#PBS -W umask=007
#PBS -q low_priority
#PBS -l nodes=5:ppn=4

: Change the current working directory to the directory from which you ran qsub:
cd $PBS_O_WORKDIR

# Execute hello_parallel using mpirun_rsh.  Note the -hostfile option
# instead of the usual -machinefile.  The 20 below should be
# replaced with the number of nodes times the number of processors
# you've requested per node.  In this case:
#    5 nodes * 4 processors/node = 20 processors

mpirun_rsh -np 20 -hostfile $PBS_NODEFILE ./hello_parallel

Your job's output, and the process of linking, submitting, monitoring, or canceling your job, should be exactly the same as with the other MPI implementations.
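
As a reminder of how that looks in practice, a typical session might resemble the following. The script name run-hello.qsub and the job id 12345 are purely illustrative; qsub, qstat, and qdel are the same Torque/PBS commands you have used with the other MPI implementations:

qsub run-hello.qsub    # submit the job; qsub prints a job id
qstat -u $USER         # check whether the job is queued (Q) or running (R)
cat qsub.out qsub.err  # inspect the output and error files named in the #PBS lines above
qdel 12345             # cancel the job by id if you need to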

MVAPICH2: Issues and Advanced Usage

Fewer than Four Processes Per Machine

Usually you should use four processes per machine, as we have done in this tutorial. Eventually you might want to run fewer than four processes per machine (for example, if you decide to use OpenMP). It isn't as simple as lowering the count in your -np option. Look here for instructions; a rough sketch of one common approach also appears below.
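
The following is only a sketch of one common approach, not necessarily the method described in the instructions linked above: build a reduced hostfile from $PBS_NODEFILE (which lists each node once per processor you requested) and hand that to mpirun_rsh. For one process per node on the five-node job above:

# Inside your qsub script, after the cd $PBS_O_WORKDIR line.
# $PBS_NODEFILE repeats each node once per requested processor;
# keep only one entry per node.
sort -u $PBS_NODEFILE > hosts.$PBS_JOBID

# Still request nodes=5:ppn=4 from PBS, but launch only 5 processes
# (one per node) instead of 20.
mpirun_rsh -np 5 -hostfile hosts.$PBS_JOBID ./hello_parallel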

Environment Variables

MVAPICH2 does not forward environment variables to your programs (e.g. hello_parallel) when it runs them on the compute nodes. Environment variables are things like your PATH and LD_LIBRARY_PATH. Your shell uses the PATH variable to find the programs you try to execute. Your programs require dynamic libraries, chunks of code shared between multiple programs (such as the implementation of C's printf or Fortran's write), and they use the LD_LIBRARY_PATH variable to find the dynamic libraries they need. For simple, self-contained programs like hello_parallel, you do not need to forward environment variables; the default values will suffice. If your programs complain that they cannot find libraries or other programs, then you will need to forward environment variables. See this page for details; an example of the mpirun_rsh syntax follows.
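
For example, mpirun_rsh accepts NAME=value assignments placed after its options and before the program name, and it passes those variables to every process it launches. A sketch for the 20-process job above, forwarding your current PATH and LD_LIBRARY_PATH (the exact list of variables is an assumption; forward whichever variables your program actually needs):

# Forward PATH and LD_LIBRARY_PATH from the submitting shell to all 20 processes.
mpirun_rsh -np 20 -hostfile $PBS_NODEFILE \
    PATH="$PATH" LD_LIBRARY_PATH="$LD_LIBRARY_PATH" ./hello_parallel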

Bizarre Errors that Mention mpd

Before the OFED 1.4 upgrade, you had to use a program called mpd in order to run MVAPICH2 jobs. The mpd program was very flaky and led to all sorts of problems. We strongly recommend you stop using mpd (the mpdboot, mpdallexit, and mpiexec commands that previously appeared on this page). Use the mpirun_rsh command instead.
